ENCODING A VIDEO WITH A VARIABLE FRAME-RATE WHILE
MINIMIZING TOTAL AVERAGE DISTORTION 

Related Patent Application 

This Patent Application is related to U.S. Patent Application Sn., 09/xxx,xxx,
"ESTIMATING TOTAL AVERAGE DISTORTION IN A VIDEO WITH
VARIABLE FRAMESKIP," filed by Vetro et al. on xxxx.

Field of the Invention 

This invention relates generally to video coding, and more particularly to optimally
encoding videos according to rate-distortion characteristics of the videos.

Background of the Invention 

A number of video coding standards support variable frame rates, e.g., H.263 and
MPEG-4. With variable frame-rates, any number of frames, or objects in the case
of MPEG-4, can be skipped during the coding of the output video. That is, the
skipped frames remain uncoded. With these video coding standards, the encoder
may choose to skip frames of a video to either satisfy buffer constraints, or to
optimize the video coding process. However, most encoders only skip frames to
satisfy buffer constraints. Buffer constraints are usually due to bit-rate (bandwidth)
limitations. The coder is forced to skip frames when insufficient bandwidth causes
the buffer to fill up. Consequently, it is not possible to add any additional frames to
the buffer, and these frames remain uncoded (skipped) until there is room in the
buffer to store a new coded frame. This type of frame skipping can degrade the



MH-5065 
Vetro et al. 

quality of the video because the content of the video is not considered. Note that 
skipping frames effectively reduces the frame-rate. 

It is a problem to provide an optimal strategy for coding a video. Specifically, the
video could be coded at a higher frame-rate having a lower spatial quality, or a
lower frame-rate having a higher spatial quality. This trade-off between spatial and
temporal quality is not a simple binary decision, but rather a decision over a finite
set of coding parameters (constraints). Obviously, the best set of coding parameters
will yield the optimal rate-distortion (R-D) curve that maximizes the frame-rate
while minimizing the distortion. The two parameters of interest are the number of
frames per second (fps or frame-rate) and a quantizer (Q) parameter. A higher
quantizer parameter increases the spatial distortion. Lowering the frame rate, by
skipping frames, reduces both the spatial and temporal distortion. In the known
prior art, the distortion is measured only for coded frames, and is expressed as the
mean-squared error (MSE) between pixels in the original video and the
compressed video. That is, the prior art methods have two problems: only spatial
distortion in coded frames is considered, and uncoded frames contributing to both
the spatial and temporal distortion are not considered at all.

Generally, prior art optimized coding methods do not consider the temporal aspect
of rate-distortion, see H. Sun, W. Kwok, M. Chien, and C.H. John Ju, "MPEG
coding performance improvement by jointly optimizing coding mode decision and
rate control," IEEE Trans. Circuits Syst. Video Technol., June 1997; T. Weigand,
M. Lightstone, D. Mukherjee, T.G. Campbell, S.K. Mitra, "R-D optimized mode
selection for very low bit-rate video coding and the emerging H.263 standard,"
IEEE Trans. Circuits Syst. Video Technol., Apr. 1996; and J. Lee and B.W.
Dickenson, "Rate-distortion optimized frame type selection for MPEG encoding,"
IEEE Trans. Circuits Syst. Video Technol., June 1997. Generally, it is assumed that
the frame-rate is fixed.

These methods consider optimizations on the quantizer parameter, H. Sun, W.
Kwok, M. Chien, and C.H. John Ju, "MPEG coding performance improvement by
jointly optimizing coding mode decision and rate control," IEEE Trans. Circuits
Syst. Video Technol., June 1997; mode decisions for motion and block coding, T.
Weigand, M. Lightstone, D. Mukherjee, T.G. Campbell, S.K. Mitra, "R-D
optimized mode selection for very low bit-rate video coding and the emerging
H.263 standard," IEEE Trans. Circuits Syst. Video Technol., Apr. 1996; and
frame-type selection, J. Lee and B.W. Dickenson, "Rate-distortion optimized frame
type selection for MPEG encoding," IEEE Trans. Circuits Syst. Video Technol., June
1997. Such methods can achieve an optimum coding when the frame-rate is fixed,
and the bit-rate can be met for the given frame-rate. However, these methods are
less than optimal for varying frame-rates.

It should be noted that the trade-off between spatial and temporal quality, while
coding, has been described by F.C. Martins, W. Ding, and E. Feig, in "Joint control
of spatial quantization and temporal sampling for very low bit-rate video," Proc.
ICASSP, May 1996. However, in their method, the trade-off was achieved
manually.

Therefore, it is desired to provide a method and system for encoding a video 
subject to a variable frame-rate, while minimizing the total average distortion. 

Summary of the Invention

The present invention optimizes the encoding of a video that allows a variable
frame-rate. The invention provides a method for determining an average distortion
for coded frames as well as uncoded frames. Using this method in conjunction with
methods that determine the frame-rate enables the invention to make an optimal
trade-off between the spatial and temporal quality in an encoded video that
minimizes the average total distortion, which includes both spatial and
temporal distortion.

More particularly, a method encodes a video as video objects. For each candidate
object, a quantizer parameter and a skip parameter that jointly minimize an
average total distortion in the video are determined while satisfying predetermined
constraints. The average total distortion includes spatial distortion of coded objects
and spatial and temporal distortion of uncoded objects. Then, the candidate objects
are encoded as the coded objects with the quantizer parameter and the skip
parameter, and the candidate objects are skipped as the uncoded objects with the
skip parameter.

Brief Description of the Drawings

Figure 1 is a flow diagram for encoding a video with variable video-object plane 
(VOP) rates; 

Figure 2 is a flow diagram of a method for determining average total distortion in a
video according to the invention;

Figure 3 is a flow diagram of a method for determining optimum rate-distortion
values while encoding a video with a variable frame rate;

Figure 4 is a plot comparing the actual and estimated rate-distortion for the
uncoded frames of the Akiyo sequence coded at a fixed frame rate of 30 fps;

Figure 5a illustrates a constrained case for object-based coding, which shows
variable VOP-rates of each object with regular or constrained VOP-skip; and

Figure 5b illustrates an unconstrained case for object-based coding, which shows
variable VOP-rates of each object with irregular or unconstrained VOP-skip.

Detailed Description of the Preferred Embodiment 

Introduction

As shown in Figure 1, our invention provides a method 100 for coding a video 101.
Moreover, the video 101 is coded with a variable temporal rate for Video Object
Planes (VOPs), or simply with variable VOP-rates. Our method determines 110 a
quantizer parameter (Q) 111 for each object, and also determines 120 a VOP-skip
parameter, or simply skip parameter ($f_s$) 121. The quantizer and VOP-rate
parameters jointly minimize 130 an average total spatial distortion 131 and a
temporal distortion 132 in the video, while satisfying predetermined constraints
133. Then, the object is encoded 141 as a coded object 152 with the quantizer
parameter 111 and the VOP-rate parameter 121, or skipped 142 as an uncoded
object 153 with only the skip parameter 121, to minimize the average
distortion while satisfying the constraints 133. According to the skip parameter
121, a coded object 151 at a given time instant is encoded 141 with quantization
parameter 111. During this process, ($f_s$ - 1) uncoded objects 153 are skipped 142.

It should be noted that, in general, a frame is a specific example of a video object
as defined in the MPEG-4 standard, particularly a fixed-size, rectangular video
object. However, the invention generally applies to any video object having
arbitrary variable shape and size. Hereinafter, we occasionally use the more
familiar term frame to describe an exemplary embodiment of any video object.

In addition, the invention can concurrently encode multiple video objects, perhaps
frames of multiple program streams in a single transport stream, or multiple objects
in a single program stream, or both.

Determining Distortion 

Figure 2 shows a method 200 for determining the average distortion due to spatial
131 and temporal 132 distortion in the video 101. The coded objects 212 and
uncoded objects 222 are candidate objects to be coded or skipped according to the
method of Figure 1.

We denote the spatial distortion 211 for coded objects 212 by $D_c(Q)$, and the
spatial and temporal distortion 221 of uncoded objects 222 by $D_s(Q, f_s)$, where $Q$
represents the quantizer parameter 111, and $f_s$ the skip parameter 121, defined in
greater detail below. In short, a skip parameter equal to 3 means code every third
object (frame) in a time sequence; a skip parameter equal to 4 means code every
fourth instance; and a skip parameter equal to 1 means code every object or frame
instance without skipping any. In other words, the skip parameter is equal to the
number of frames that have been skipped at a given time instant, plus one. This
parameter may change throughout the encoding of the video. However, the average
skip parameter $\bar{f}_s$, discussed below, may be used to indicate a longer-term effect
regarding implications on the average bit-rate.
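
As an informal illustration of the skip parameter, the following minimal sketch lists
which time instants are coded for a constant skip value. The function name
`coded_instants` and the convention that index 0 is the previously coded object are
assumptions made here for illustration only.

```python
def coded_instants(num_frames, skip):
    """Return the indices of coded frames when every `skip`-th frame is coded.

    Assumption: index 0 is the last coded frame before the interval, so the
    first newly coded frame has index `skip` and `skip - 1` frames are skipped
    before it.
    """
    if skip < 1:
        raise ValueError("skip parameter must be at least 1")
    return list(range(skip, num_frames + 1, skip))


# skip = 3: two frames are skipped before each coded frame.
print(coded_instants(9, 3))   # [3, 6, 9]
# skip = 1: every frame is coded.
print(coded_instants(4, 1))   # [1, 2, 3, 4]
```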






The spatial distortion 211 is dependent on the quantizer parameter $Q$, a spatial
measure, while the temporal distortion 221 depends on both the quantizer and skip
parameters.



Although the average distortion for uncoded objects does not directly influence the
distortion of coded objects, the first distortion does influence the second distortion
indirectly in two ways. First, the number of uncoded objects influences a residual
statistical component, and second, the first distortion influences the quantizer
parameter that is selected.

It is important to note that the distortion 211 for the uncoded frames 222 has a
direct dependency on the quantization step size in the coded frames 212. The
reason is that the uncoded frames 222 are interpolated from the coded frames 212,
thereby carrying the same spatial quality, in addition to the temporal distortion
caused by skipping the frame.

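Given the above, we determine the average distortion over a specific time interval
$(t_i, t_{i+f_s}]$ by,

$$\bar{D}_{(t_i,\,t_{i+f_s}]}(Q, f_s) = \frac{1}{f_s}\left[ D_c(Q_{i+f_s}) + \sum_{k=i+1}^{i+f_s-1} D_s(Q_i, k) \right]. \qquad (1)$$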
In equation 1, the average distortion over the specified time interval is due to the
spatial distortion of one coded object at $t = t_{i+f_s}$, plus the temporal distortion of
$f_s - 1$ uncoded objects. The temporal distortion is dependent on the quantizer
parameter for the previously coded object at $t = t_i$.
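
The averaging in equation 1 can be sketched as follows. This is only an illustration
under the assumption that the per-object distortions are already available as
numbers; the function name `average_distortion` and the sample values are
hypothetical.

```python
def average_distortion(d_coded, d_skipped):
    """Equation 1: mean distortion over one interval (t_i, t_{i+fs}].

    d_coded   -- distortion of the single coded object at the end of the interval
    d_skipped -- list with the distortion of each of the fs - 1 uncoded objects
    """
    f_s = len(d_skipped) + 1          # skip parameter implied by the inputs
    return (d_coded + sum(d_skipped)) / f_s


# Skip parameter 3: one coded object and two uncoded objects.
print(average_distortion(10.0, [14.0, 18.0]))   # 14.0
```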

Spatial Distortion 

The variance of the quantization error is

$$\sigma_q^2 = a \cdot 2^{-2R} \cdot \sigma_z^2, \qquad (2)$$

where $\sigma_z^2$ is the input signal variance, $R$ is the average rate per sample, and $a$ is a
constant that is dependent on the probability distribution function (PDF) of the
input signal and quantizer characteristics, see Jayant et al. "Digital Coding of
Waveforms," Prentice Hall, 1984. In the absence of entropy coding, the value of $a$
typically varies between 1.0 and 10. With entropy coding, the value of $a$ can be
less than 1.0. We use equation 2 to determine 210 the spatial distortion 211 as,

$$D_c(Q_i) = a \cdot 2^{-2R(t_i)} \cdot \sigma_{z_i}^2. \qquad (3)$$

Equation 3 is valid for a wide array of quantizer parameters and signal
characteristics. Such aspects are accounted for in the value of $a$. However, as stated
above, the number of uncoded objects can impact the statistics of the residual. In
general, we have determined that the average bits per object increases for larger
values of $f_s$.

However, the variance remains substantially the same. This indicates that the
variance is incapable of reflecting small differences in the residual that impact the
actual relation between rate and distortion. This is caused by the presence of high-
frequency coefficients. Actually, it is not only the presence of the high-frequency
coefficients, but also their position. If certain run-lengths are not present in a
variable length coding table, e.g. Huffman coding, less efficient escape coding
techniques must be used. This probably means that $f_s$ affects the PDF of the
residual, i.e., the value of $a$, while holding $\sigma_z^2$ substantially fixed.

We ignore any changes in the residual due to the uncoded frames, and use the
model given by equation 3 to determine the spatial distortion 211. A fixed $a$ and
$\sigma_z^2$ determined from the last coded frame is used.
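
To make the spatial model concrete, the sketch below evaluates equation 3 for a few
rates. The function name `spatial_distortion` and the numeric values for the
constant, the rate and the variance are hypothetical and chosen only for illustration.

```python
def spatial_distortion(a, rate_bits_per_sample, variance):
    """Equation 3: distortion = a * 2^(-2R) * (input signal variance)."""
    return a * 2.0 ** (-2.0 * rate_bits_per_sample) * variance


# Hypothetical values: a = 1.0 and a variance of 64.0 taken from the last
# coded frame. Each extra bit per sample divides the distortion by four.
for rate in (0.5, 1.0, 2.0):
    print(rate, spatial_distortion(1.0, rate, 64.0))
```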



Temporal Distortion

To determine 220 the spatial and temporal distortion 221 of the uncoded objects
222, we assume, without loss of generality, that a temporal interpolator of a coder
can simply repeat the last coded object. Other interpolators, that average past and
future coded objects, or make predictions based on motion, can also be considered.

As stated above, the distortion due to uncoded frames has two parts: one spatial
due to the coding of the reference frame (last coded frame), and another temporal
due to the interpolation error. We express the distortion at $t_k$ as,

$$e_k = \psi_k - \hat{\psi}_k = \psi_k - \tilde{\psi}_i = \underbrace{\psi_k - \psi_i}_{\Delta z_{i,k}} + \underbrace{\psi_i - \tilde{\psi}_i}_{\Delta c_i}, \qquad (4)$$

wherein $\hat{\psi}_k$ denotes the estimated frame at $t = t_k$, $\tilde{\psi}_i$ denotes the last coded frame at
$t_i < t_k$, $\hat{\psi}_k = \tilde{\psi}_i$, and $\Delta z_{i,k}$ and $\Delta c_i$ represent the frame interpolation error and coding
error, respectively. If these quantities are independent, the mean square error
(MSE) is

$$E\{e_k^2\} = E\{\Delta^2 c_i\} + E\{\Delta^2 z_{i,k}\}, \qquad (5)$$

which can be equivalently expressed as,

$$D_s(Q_i, k) = D_c(Q_i) + E\{\Delta^2 z_{i,k}\}, \qquad (6)$$

that is, the combination 230 of the spatial and temporal distortions. Equation 6
implies that the components contributing to the spatial and temporal distortion 221
are additive. However, other combinations may also be considered.

To derive the expected MSE due to frame interpolation, we first assume that the
frame at time $t_k$ is related to the frame at time $t_i$ with motion vectors
$(\Delta x(x,y), \Delta y(x,y))$,

$$\psi_k(x, y) = \psi_i(x + \Delta x(x, y),\; y + \Delta y(x, y)). \qquad (7)$$

In equation 7, it is assumed that every pixel $(x, y)$ has an associated motion
vector. In actuality, we approximate the motion at every pixel by having one
motion vector per macroblock. Then,

$$\Delta z_{i,k} = \psi_i(x + \Delta x_{i,k},\; y + \Delta y_{i,k}) - \psi_i(x, y) = \frac{\partial \psi_i}{\partial x}\,\Delta x_{i,k} + \frac{\partial \psi_i}{\partial y}\,\Delta y_{i,k}, \qquad (8)$$

where $(\frac{\partial \psi_i}{\partial x}, \frac{\partial \psi_i}{\partial y})$ represent the spatial gradients in the x and y directions. Note,
this equation is expanded by using a first-order Taylor expansion and is valid for
small $(\Delta x, \Delta y)$. This is equivalent to an optical flow equation, where the same
condition on motion is also true.
It should be noted that equation 8 is less accurate when the amount of motion in a
sequence of frames is large. However, for coding applications that estimate the
distortion to decide if a lower MSE can be achieved with more uncoded frames, the
accuracy of the motion estimation is not so critical because an optimized encoder
would not skip frames for such sequences anyway. The MSE incurred by skipping
frames in a sequence with large motion would be very large.

Treating the spatial gradients and motion vectors as random variables and
assuming the motion vectors and spatial gradients are independent and zero-mean,
we have,

$$E\{\Delta^2 z_{i,k}\} = \sigma^2_{x_i}\,\sigma^2_{\Delta x_{i,k}} + \sigma^2_{y_i}\,\sigma^2_{\Delta y_{i,k}}, \qquad (9)$$

where $(\sigma^2_{x_i}, \sigma^2_{y_i})$ represent the variances for the x and y spatial gradients in frame i,
and $(\sigma^2_{\Delta x_{i,k}}, \sigma^2_{\Delta y_{i,k}})$ represent the variances for the motion vectors in the x and y
direction. Equation 9 shows that it is sufficient to determine the temporal distortion
from the second-order statistics of the motion and spatial gradient.
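
The second-order estimate of equation 9 can be sketched with sample variances. This
is a minimal illustration, assuming the gradient and motion-vector samples have
already been collected (for example one motion vector per macroblock); the function
name `temporal_distortion` and the sample data are hypothetical.

```python
from statistics import pvariance


def temporal_distortion(grad_x, grad_y, mv_x, mv_y):
    """Equation 9: var(grad_x) * var(mv_x) + var(grad_y) * var(mv_y).

    grad_x, grad_y -- samples of the spatial gradients of frame i
    mv_x, mv_y     -- samples of the motion-vector components
    All four are treated as zero-mean, so variances are taken about 0.
    """
    return (pvariance(grad_x, mu=0) * pvariance(mv_x, mu=0)
            + pvariance(grad_y, mu=0) * pvariance(mv_y, mu=0))


# Hypothetical samples: stronger vertical motion dominates the estimate.
gx, gy = [-2.0, 2.0], [-1.0, 1.0]
dx, dy = [-1.0, 1.0], [-3.0, 3.0]
print(temporal_distortion(gx, gy, dx, dy))
```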

The model in equation 9 is accurate for low to moderate motion sequences. This is
sufficient because an optimized coder would not need such an accurate model
when the motion is high, see U.S. Patent Application Sn., 09/xxx,xxx,
"ESTIMATING TOTAL AVERAGE DISTORTION IN A VIDEO WITH
VARIABLE FRAMESKIP," filed by Vetro et al. on xxx, and incorporated herein
in its entirety by reference.

Determining Rate

A quadratic rate-quantizer (R-Q) relationship for a single object at time $t = t_k$ can
be determined by,

$$R(t_k) = S_k \left( \frac{X_1}{Q_k} + \frac{X_2}{Q_k^2} \right), \qquad (10)$$

where $S_k$ is the encoding complexity, often substituted by the sum or mean of
absolute differences of the residual component, $Q_k$ denotes the quantizer parameter
111, and $X_1$ and $X_2$ denote the model parameters that are fitted to the data, see
T. Chiang and Y-Q. Zhang, "A new rate control scheme using quadratic rate-
distortion modeling," IEEE Trans. Circuits Syst. Video Technol., Feb 1997, and
A. Vetro, H. Sun, and Y. Wang, "MPEG-4 rate control for multiple video objects,"
IEEE Trans. Circuits and Syst. Video Technol., Feb. 1999. Other methods can also
be used, see H.M. Hang and J.J Chen, "Source model for transform video coder
and its application - Part I: Fundamental theory," IEEE Trans. Circuits Syst. Video
Technol., vol.7, no.2, pp. 287-298, April 1997. In any case, given the R-Q
relationship for a single frame, the average bit-rate over time, $\bar{R}$, is determined by,



$$\bar{R} = \sum_{k=i+1}^{i+\bar{F}} R(t_k) = \bar{F} \cdot \bar{R}(t_k), \qquad (11)$$

where $\bar{F}$ is the average frame-rate, and $\bar{R}(t_k)$ is the average bit-rate per frame.
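
As a hedged sketch of how the quadratic model of equation 10 might be evaluated and
inverted, the following assumes two fitted model parameters (written `x1` and `x2`
here) and solves the resulting quadratic in 1/Q. The function names and the sample
numbers are hypothetical; a practical coder would additionally round the result and
clip it to the quantizer range permitted by the coding standard.

```python
import math


def rate_from_q(s, q, x1, x2):
    """Equation 10: bits = S * (X1 / Q + X2 / Q^2)."""
    return s * (x1 / q + x2 / (q * q))


def q_from_rate(s, target_bits, x1, x2):
    """Invert equation 10 for Q > 0 given a target number of bits.

    With u = 1/Q the model becomes s*x2*u^2 + s*x1*u - target_bits = 0.
    Assumes s > 0, target_bits > 0, x1 >= 0 and x2 >= 0 (not both zero).
    """
    if x2 == 0:
        return s * x1 / target_bits
    disc = (s * x1) ** 2 + 4.0 * s * x2 * target_bits
    u = (-s * x1 + math.sqrt(disc)) / (2.0 * s * x2)
    return 1.0 / u


bits = rate_from_q(1000.0, 10.0, 1.0, 10.0)
print(bits, q_from_rate(1000.0, bits, 1.0, 10.0))
```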

The parameter that relates the rate and distortion is the skip parameter, $f_s$,
introduced above. This parameter can change at each coding instant, therefore the
relation between the skip parameter and the average coded frame rate, $\bar{F}$, is
defined by the average skip parameter, $\bar{f}_s$, and is given by,

$$\bar{f}_s = \frac{F_{src}}{\bar{F}}, \qquad (12)$$
where $F_{src}$ is the source frame-rate. For example, if the source frame-rate is 30 fps,
and the average coded frame rate is 10, then the average skip parameter is 3, and
only every third frame, i.e., one in every $\bar{f}_s$ frames, is coded. To be clear, $f_s$ is a
parameter used to quantify the distortion due to skipping objects or frames. In turn,
this parameter affects the values of $\bar{f}_s$ and $\bar{F}$, and ultimately relates to the average
bit-rate $\bar{R}$.
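
The arithmetic of equations 11 and 12 for the example above can be sketched as
follows; the function names and the per-frame bit count are hypothetical.

```python
def average_skip(source_fps, coded_fps):
    """Equation 12: average skip parameter = source rate / average coded rate."""
    return source_fps / coded_fps


def average_bit_rate(coded_fps, bits_per_frame):
    """Equation 11: average bit-rate = average frame-rate * average bits per frame."""
    return coded_fps * bits_per_frame


print(average_skip(30.0, 10.0))          # 3.0
print(average_bit_rate(10.0, 6400.0))    # 64000.0
```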

Frame-Based Rate Control 

We have described how to determine the frame-rate for the coded frames, or
generally video objects, and the average distortion over a given time interval for
the coded and uncoded objects. We now describe a rate control method that
minimizes the average distortion, subject to constraints on the overall bit-rate and
buffer occupancy. Formally, we express the method and its three constraints by,

$$\arg\min_{[Q_{i+f_s},\, f_s]} \bar{D}_{(t_i,\,t_{i+f_s}]}(Q_{i+f_s}, f_s) \qquad (13)$$

$$\text{s.t.} \quad \begin{cases} \bar{R} < R \\ B_i + R(t_{i+f_s}) < B_{\max} \\ B_i + R(t_{i+f_s}) - f_s \cdot R_{drain} > 0 \end{cases}$$

where $R$ is the target bit-rate, $B_{\max}$ is the maximum buffer size in bits, $B_i$ is the
current buffer level, also in bits, and $R_{drain}$ is the rate at which the buffer "drains"
per object.

Informally, we determine the values of the quantizer ($Q$) 111 and skip parameter
($f_s$) 121 that minimize 130 the average distortion 131-132, such that the target bit-
rate, buffer size, buffer level, and drain rate constraints are satisfied.
As shown in Figure 3, we determine the minimizing rate-distortion parameters
131-132 by the following process steps. Let $f_l$ denote the skip parameter computed
in a previous coding iteration. We begin encoding the video 101 sequence by
setting $f_l$ equal to 1. This means that the full frame-rate is initially used, and all
frames are encoded. Then, the iterations at each coding instant are as follows.

In step 310, we set the skip parameter as $f_s = \max\{1, f_l - \delta\}$, and $D_{\min} = \infty$.

In step 320, we determine the target number of bits for the object. This value is
mainly dependent on the current value of $f_s$ and $B_i$.

In step 330, we determine the value of the quantizer parameter $Q_{i+f_s}$ using equation
10.

In step 340, we determine if the quantizer parameter 111 and skip parameter 121
still satisfy the bit-rate and buffer constraints. If false, then increment 341 the
skip parameter, as long as the new $f_s \le \min\{f_l + \delta, f_{\max}\}$, because the current value
of $f_s$ is no longer valid, and iterate the previous steps.

In step 350, we determine the distortion using equation 1.

If the constraints of step 340 are satisfied, then in step 360 we determine if the
current distortion is less than $D_{\min}$. If false, we proceed with step 341 as described
above. If true, we replace $D_{\min}$ with the current distortion and record 370 the
encoding parameters $f_s$ 121 and $Q_{i+f_s}$ 111 for this given coding time instant. It
should be noted that the parameter $\delta$ is used to limit the change in frame-rate from
one coded frame to another, similar to the known bounding of the quantizer
parameter 111.
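
The steps of Figure 3 can be sketched as a bounded search over candidate skip
parameters. This is an illustration only: the four callables passed in are
hypothetical stand-ins for the target-bit, rate-quantizer, constraint and distortion
computations, and the sweep over every candidate in the window is our reading of the
loop back through step 341.

```python
import math


def choose_parameters(f_prev, delta, f_max, target_bits, q_for_bits,
                      constraints_ok, avg_distortion):
    """Search skip parameters near f_prev for the minimum average distortion.

    All callables are hypothetical stand-ins:
      target_bits(f_s)        -> bit budget for the next coded object (step 320)
      q_for_bits(bits)        -> quantizer from the rate model (step 330)
      constraints_ok(q, f_s)  -> bit-rate and buffer checks (step 340)
      avg_distortion(q, f_s)  -> average distortion of equation 1 (step 350)
    Returns (f_s, q, distortion), or None if no candidate is feasible.
    """
    best = None
    d_min = math.inf                                  # step 310
    f_s = max(1, f_prev - delta)
    while f_s <= min(f_prev + delta, f_max):
        bits = target_bits(f_s)                       # step 320
        q = q_for_bits(bits)                          # step 330
        if constraints_ok(q, f_s):                    # step 340
            d = avg_distortion(q, f_s)                # step 350
            if d < d_min:                             # step 360
                d_min = d
                best = (f_s, q, d)                    # step 370
        f_s += 1                                      # step 341
    return best


# Toy stand-ins: a larger skip gives more bits per object and a finer
# quantizer, while the temporal penalty grows with the number of skips.
result = choose_parameters(
    f_prev=2, delta=1, f_max=4,
    target_bits=lambda f_s: 1000.0 * f_s,
    q_for_bits=lambda bits: 20000.0 / bits,
    constraints_ok=lambda q, f_s: q <= 31,
    avg_distortion=lambda q, f_s: q * q + 40.0 * (f_s - 1) ** 2,
)
print(result)   # (2, 10.0, 140.0)
```

With these toy functions the search settles on a skip parameter of 2: coding every
frame leaves too few bits per frame, while skipping two frames incurs too much
temporal distortion.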



Target Bit-Rate and Buffer Control 

Given a candidate value of frame skip $f_s$, a target bit-rate $T$ for a particular object is
dependent on this value of $f_s$ and the current buffer level $B_i$. An initial target, $T_1$, is
determined according to the number of bits remaining in the video, the number of
remaining objects, and the number of bits required to encode the last object, see A.
Vetro, H. Sun, and Y. Wang, "MPEG-4 rate control for multiple video objects,"
IEEE Trans. Circuits and Syst. Video Technol., Feb. 1999. The only difference
between this initial estimate and subsequent rates is that the remaining number of
objects is divided by the candidate skip parameter. In this way, a proportionately
higher number of bits will be assigned to each object when the skip parameter is
higher.

After the initial target bit rate has been determined, it is scaled according to,

$$T = T_1 \cdot \frac{\tilde{B} + 2(B_{\max} - \tilde{B})}{2\tilde{B} + (B_{\max} - \tilde{B})}, \qquad (14)$$

where a modified buffer fullness $\tilde{B}$ accounts for the current value of the skip
parameter, and is expressed as,

$$\tilde{B} = B_i - (f_s - 1) \cdot R_{drain}. \qquad (15)$$

This modification is made to reflect the lower occupancy level as a result of object
skipping. In contrast, prior art methods do not make this adjustment and the
scaling operation of equation 14 would force the target bit-rate too low, see
ISO/IEC 14496-5:2000 "Information technology - coding of audio/visual objects,"
Part 5: Reference Software.

If the target bit-rate is too low for lower skip parameter values, the resulting
quantizer parameter is unable to differentiate itself from quantizers that were
determined at lower skip parameter values. In this case, it is difficult to make the
trade-off between coded and temporal distortion in equation 1 to ever favor
skipping objects.
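
The buffer-based scaling of equations 14 and 15 can be sketched as below. The
function names and the numeric buffer values are hypothetical and serve only to show
that a larger skip parameter lowers the modified buffer fullness and therefore
raises the scaled target.

```python
def modified_buffer_level(b_i, f_s, r_drain):
    """Equation 15: buffer level after draining through f_s - 1 skipped objects."""
    return b_i - (f_s - 1) * r_drain


def scaled_target(t_init, b_i, b_max, f_s, r_drain):
    """Equation 14: scale the initial target by the modified buffer fullness."""
    b_mod = modified_buffer_level(b_i, f_s, r_drain)
    return t_init * (b_mod + 2.0 * (b_max - b_mod)) / (2.0 * b_mod + (b_max - b_mod))


# Hypothetical numbers: a half-full buffer of 10000 bits draining 1000 bits
# per object, with an initial target of 3000 bits.
for f_s in (1, 2, 3):
    print(f_s, scaled_target(3000.0, 5000.0, 10000.0, f_s, 1000.0))
```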

Practical Considerations

In practical coding applications, where an encoder would estimate the total
distortion, the main problem is to determine the temporal distortion based on past
and current data. For instance, equation 9 assumes that the motion between i, the
current object, and k, a future object, is known. However, this would imply that
motion estimation is performed for each candidate object to be coded or not, where
these candidate objects have a time index k. This is impractical. Therefore, we
assume the motion between objects is linear, and approximate the variance of
motion vectors by,

$$\sigma^2_{\Delta x_{i,k}} = \sigma^2_{\Delta x_{i-f_l,i}} \cdot \left( \frac{k-i}{f_l} \right)^2, \qquad \sigma^2_{\Delta y_{i,k}} = \sigma^2_{\Delta y_{i-f_l,i}} \cdot \left( \frac{k-i}{f_l} \right)^2, \qquad (16)$$

where $f_l$ denotes the number of uncoded objects between the last coded object and
its reference object.
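
Under the linear-motion assumption, the variance measured over the last coded
interval is simply rescaled for each candidate distance. The sketch below shows that
scaling; the function name `extrapolated_mv_variance` and the sample values are
hypothetical.

```python
def extrapolated_mv_variance(var_prev, k_minus_i, f_l):
    """Equation 16: scale the last measured motion-vector variance.

    var_prev  -- variance measured between the last coded object and its
                 reference, which are f_l time instants apart
    k_minus_i -- distance from the current object i to the candidate k
    """
    return var_prev * (k_minus_i / f_l) ** 2


# Measured variance 4.0 over a distance of 2; candidates 1, 2 and 3 ahead.
print([extrapolated_mv_variance(4.0, d, 2) for d in (1, 2, 3)])   # [1.0, 4.0, 9.0]
```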

Similarly, estimates of the distortion for the next candidate object to be coded, i.e.,
the measurement specified by equation 3, require knowledge of $a$ and $\sigma_z^2$, which
depend on $f_s$. As mentioned above, motion estimation for every candidate object is
not performed, therefore the actual residuals are also unavailable. To overcome this
practical difficulty, the residual for future objects can be predicted from the
residual of the current object, i.e., at $t = t_i$.

However, as described above, the relationship between $a$, $\sigma_z^2$ and the uncoded
objects is not as obvious as the relation between motion and the uncoded objects.
Also, we have observed that changes in the variance for different numbers of
uncoded objects are very small. Therefore, we use the residual variance of the
current object at $t = t_i$ for the candidate objects as well. In this way, changes in $D_c$
are only affected by the "bit budget" for candidate skip factors.

One practical problem to consider is how the equations for the distortion of non-
coded objects are evaluated based on current and past data. For instance, in its
current form, equation 8 assumes that the motion between i, the current time
instant, and k, a future time instant, is known. However, this would imply that
motion estimation is performed for each candidate object, k. Because such
computations are not practical, it is reasonable to assume linear motion between
objects and approximate the variance of motion vectors by,

$$\sigma^2_{\Delta x_{i,k}} = \sigma^2_{\Delta x_{i-f_l,i}} \cdot \left( \frac{k-i}{f_l} \right)^2, \qquad \sigma^2_{\Delta y_{i,k}} = \sigma^2_{\Delta y_{i-f_l,i}} \cdot \left( \frac{k-i}{f_l} \right)^2. \qquad (16)$$

Similarly, estimates of the distortion for the next object to be coded (i.e.,
calculation of equation 6) require knowledge of $a$ and $\sigma_z^2$, which depend on $f_s$. As
mentioned earlier, motion estimation for every candidate object is not performed,
therefore the actual residuals are not available either. To overcome this practical
difficulty, the residual for future objects may also be predicted based on the
residual of the current object at $t = t_i$. However, as discussed earlier, the
relationship between $a$, $\sigma_z^2$ and skip is not as obvious as the relation between
motion and skip. Also, we have observed that changes in the variance for different
skip are very small. Therefore, we use the residual variance of the current object at
$t = t_i$ for the candidate objects as well. In this way, changes in $D_c$ are only affected
by the bit budget for candidate skip factors.

Frame-Based Results 

Figure 4 shows that our method is accurate for the well-known test sequence Akiyo.
This sequence is encoded at a number of constant bit-rates using the standard
MPEG-4 rate control method that is implemented as part of the reference software,
ISO/IEC 14496-5:2000 "Information technology - coding of audio/visual objects,"
Part 5: Reference Software. The bit-rates that we consider range from 32 Kbps to
256 Kbps, and the sequences are encoded at a full frame-rate of 30 fps.

Figure 4 shows that the method according to the invention outperforms the
reference method. At the lowest bit-rates, the difference is almost 1 dB, while at
higher bit-rates, an improvement of 0.4 dB is observed. In the low bit-rate
simulations, the reference method is forced to skip objects due to buffer constraints,
whereas the proposed method skips objects based on the minimum distortion
criterion and rate constraints as described above.

Object-Based Rate Control

To achieve gains in videos with areas of a larger amount of motion, especially a
video where the fast motion is localized, e.g., the mouth in the Akiyo sequence, we
prefer an object-based framework. In this framework, different objects are coded
with different temporal resolutions (video-object-plane or VOP-rates) and different
quantization parameters. The frame-rate is a special case of the VOP-rate, that is,
the object is an entire frame.

Similar to the problem statement for the frame-based approach, we minimize the
average distortion over time, subject to constraints on the bit-rate and buffer size.
As defined in equation 13, the minimum distortion is determined by jointly
selecting a skip parameter that decides the next frame (object) to be coded, and the
quantization parameter that is used to actually code the object.

However, in an object-based framework, we have the freedom to choose different
skip parameters and corresponding quantization parameters for each video object.
Although such freedom provides the potential for coding gain, it also complicates
the problem significantly, because now we must track the individual time instants
at which each object is coded. This is necessary because we must allocate bits
according to a new buffering policy. The new policy may need to account for
irregular buffer updates based on arbitrarily shaped objects with different
complexity and size. Furthermore, this must be done to avoid any potential
composition problems that would be encountered by the decoder. For details on the
composition problem and how it can be avoided, see U.S. Patent Application Sn.,
09/579,889, "Method for encoding and transcoding multiple video objects with
variable temporal resolution", filed by Vetro et al. on May 26, 2000.
In the prior art, see C.W. Hung and D.W. Lin, "Towards jointly optimal rate
allocation for multiple videos with possibly different object rates," in Proc. Int'l
Symp. on Circuits and Systems, Geneva, Switzerland, May 2000, the problem of
rate allocation for multiple video sequences with different object rates was
considered. They described the problem only within the context of frame-based
video coding, where composition problems were not a concern.

The frame-based problem for multiple video coding was described by L. Wang and
A. Vincent, in "Joint rate control for multi-program video coding," IEEE Trans.
Consumer Electronics, vol.42, no.3, pp. 300-305, Aug. 1996. However, the
possibility to have different frame-rates in video sequences was never considered.
From this earlier work, however, the concept of a super-frame is still used. A
super-frame refers to a set of video objects that are co-located in time.

Figure 5a illustrates this concept for a constrained case, and Figure 5b for the
unconstrained case. In these Figures, a super-frame is represented by the different
video sources that are encapsulated in the dotted lines. For objects in the same
scene, this term becomes less meaningful because all objects are in one frame. The
method described by Hung and Lin considered both the constrained and
unconstrained cases to deal with the rate allocation and buffer control problems
under varying temporal conditions.

In the constrained case, the delay is dependent on the super-frame period, which is 
equal to the time between cycles and can be calculated from the fixed VOP-rates of 
each sequence. For example, in Figure 5a, the cycle is equal to 6. Within this cycle, 
the R-D characteristics of each object are accumulated and bit allocation is then 
performed. Overall, this technique suffers from three main problems: (1) delay is 
introduced to collect the R-D values, (2) the actual R-D values are obtained 
through a simulated coder, and (3) the VOP-rates for each sequence are chosen by 
some other method. 

With respect to the first problem, restricting the range of observation time can 
reduce delay. This is actually what is done for multiple video sequences with no 
periodic structure, i.e., when the super-frame period is infinite or the VOP-rates are 
unconstrained. However, this limitation in observation time requires the ability to 
predict the R-D characteristics for future objects having different complexity and 
size. Given that this can be done, it is no longer necessary to collect the actual R-D 
values. Finally, if these R-D values contain information about the distortion for 
non-coded objects, then there is no need to choose the fixed VOP-rates for each 
sequence beforehand. This assumes some a priori knowledge about objects in 
video sequences. 

Our method solves all of the above problems. Similar to Equation 13 without the 
restrictions on the bit-rate and buffer size, the problem for the constrained case as 
shown in Figure 5a can formally be stated as, 
$$\arg\min_{[Q,\, f_s]} \; \bar{D}_{(t_i,\, t_{i+f_s}]}(Q, f_s, \theta), \qquad (17)$$

where Q is a matrix of quantization parameters for each video object plane (object) 
coded at various time instants within the time interval $(t_i, t_{i+f_s}]$, and $f_s$ 
denotes the time duration of a periodic cycle. The length of this cycle is a parameter 
itself and is dependent on the individual skip parameter for each object, specified 
by $\theta$. 

With variable VOP-rates for each object, we do not assume that each object will be 
coded at every time instant within the specified interval. Therefore, zero values are 
placed in Q to denote time instants at which a particular object remains uncoded. In 
the example shown in Figure 5a, $\theta^T = [1, 2, 3]$, $f_s = 6$, and Q would be a 
3x6 matrix with 7 of its 18 elements being zero. 
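
The zero pattern of Q in this example can be checked with a short sketch. It 
assumes, in line with the coded indices listed for Figure 5a, that an object with 
skip parameter k is coded at every time index that is a multiple of k. The function 
names coded_indices and coding_mask are illustrative only and do not come from 
the original text. 

```python
def coded_indices(skip, cycle):
    """Time indices 1..cycle at which an object with this skip is coded.

    Assumption: an object with skip k is coded at every multiple of k.
    """
    return [n for n in range(1, cycle + 1) if n % skip == 0]


def coding_mask(theta, cycle):
    """One row per object: 1 where the object is coded, 0 where it is skipped."""
    return [[1 if n % skip == 0 else 0 for n in range(1, cycle + 1)]
            for skip in theta]


theta = [1, 2, 3]   # per-object skip parameters of the Figure 5a example
cycle = 6           # cycle length f_s of the same example

mask = coding_mask(theta, cycle)
zeros = sum(row.count(0) for row in mask)
print(zeros)        # number of zero entries in the 3x6 pattern
```

A zero in the mask marks an entry of Q that would hold a zero quantization 
parameter, i.e., a skipped object. 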

In order to satisfy the VOP-rate requirements for the constrained case in general, 
i.e., the VOP-rates of all of the objects must lead to a periodic structure, we require 
that 

$$f_s = \mathrm{LCF}(\theta) < f_{\max}, \qquad (18)$$

where LCF($\theta$) denotes the least common factor among the VOP-rates $\theta$. 
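
As a minimal sketch of this periodicity requirement, the cycle length can be 
computed from the skip parameters. The sketch reads the LCF as the smallest value 
that is an integer multiple of every skip parameter, which is an interpretation 
chosen because it reproduces the cycle of 6 for the skips 1, 2 and 3 in Figure 5a. 
The names cycle_length and is_periodic_within, and the f_max values used in the 
checks, are hypothetical. 

```python
from functools import reduce
from math import gcd


def cycle_length(theta):
    """Smallest cycle that is an integer multiple of every skip parameter."""
    return reduce(lambda a, b: a * b // gcd(a, b), theta, 1)


def is_periodic_within(theta, f_max):
    """True if the skips give a periodic structure shorter than f_max."""
    return cycle_length(theta) < f_max


print(cycle_length([1, 2, 3]))            # cycle for the Figure 5a skips
print(is_periodic_within([1, 2, 3], 10))  # f_max = 10 is an assumed limit
```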



To further define the constraints on the bit-rate and buffer size, we let M denote the 
set of all objects and $T_j$ denote the set of time indices for which an object j is 
coded. For example, in Figure 5a, $T_0 = \{1, 2, 3, 4, 5, 6\}$, $T_1 = \{2, 4, 6\}$, and 
$T_2 = \{3, 6\}$. Then, the constraint on the rate is expressed as, 

$$\sum_{j \in M} \sum_{n \in T_j} R_j(t_{i+n}) < \bar{R}, \qquad (19)$$

which essentially says that the sum of the bit-rates for all objects, at all time 
instants within the specified time interval, must be less than the calculated bit-rate 
budget over that time interval. 
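
The rate constraint can be illustrated with a small sketch that sums the bits over 
every object and every index at which that object is coded, then compares the 
total with the budget. The per-object bit counts and the budget below are made-up 
numbers, and the names total_bits and within_rate_budget are hypothetical. 

```python
def total_bits(bits, coded):
    """Sum the bits of every object over the indices at which it is coded.

    bits[j][n] -- bits spent on object j at time index n (illustrative table)
    coded[j]   -- indices at which object j is coded (the set T_j)
    """
    return sum(bits[j][n] for j in coded for n in coded[j])


def within_rate_budget(bits, coded, budget):
    """True if the summed bits stay strictly below the bit budget."""
    return total_bits(bits, coded) < budget


# Coded indices of the Figure 5a example.
coded = {0: [1, 2, 3, 4, 5, 6], 1: [2, 4, 6], 2: [3, 6]}
# Assumed constant cost per coded object, for illustration only.
cost = {0: 1000, 1: 2000, 2: 3000}
bits = {j: {n: cost[j] for n in coded[j]} for j in coded}

print(total_bits(bits, coded))
print(within_rate_budget(bits, coded, budget=20000))
```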

To define the constraints on the buffer size, we let $L = \bigcup_{j \in M} T_j$ denote 
the complete set of coded indices. Also, given $l \in L$, we let $l_0$ equal the 
previous value of $l$ except when $l$ is the first element in $L$; in that case, 
$l_0 = 0$. Then, we define the set of buffer constraints as, 
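$$\begin{aligned}
B_{i+l_0} + \sum_{j \in M_l} R_j(t_{i+l}) &< B_{\max}; \quad \forall\, l \in L \\
B_{i+l_0} + \sum_{j \in M_l} R_j(t_{i+l}) - (l - l_0)\, R_{\mathrm{drain}} &> 0; \quad \forall\, l \in L
\end{aligned} \qquad (20)$$

where $M_l$ denotes the set of objects that are coded at index $l$. The above 
conditions ensure that buffer overflow and underflow are avoided at every coded 
time instant. 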
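
A hedged sketch of such a buffer check follows. It assumes a simple buffer model 
that is not spelled out here: at each coded index the bits of all objects coded at 
that index are added to the buffer, and the buffer drains by a fixed amount for 
every time index elapsed since the previous coded index. The function name 
buffer_ok, its parameters, and all numeric values are hypothetical. 

```python
def buffer_ok(coded, bits, start_level, drain_per_index, buffer_size):
    """Check overflow and underflow at every coded index of one cycle.

    coded[j]   -- indices at which object j is coded
    bits[j][n] -- bits spent on object j at index n
    Assumed model: add the coded bits, test for overflow, then drain for
    the indices elapsed since the previous coded index and test for underflow.
    """
    indices = sorted(set().union(*coded.values()))  # all coded indices
    level = start_level
    prev = 0                                        # previous index; 0 at the start
    for idx in indices:
        added = sum(bits[j][idx] for j in coded if idx in coded[j])
        if level + added >= buffer_size:
            return False                            # overflow
        level = level + added - (idx - prev) * drain_per_index
        if level <= 0:
            return False                            # underflow
        prev = idx
    return True


coded = {0: [1, 2, 3, 4, 5, 6], 1: [2, 4, 6], 2: [3, 6]}
cost = {0: 1000, 1: 2000, 2: 3000}                  # assumed bits per coded object
bits = {j: {n: cost[j] for n in coded[j]} for j in coded}

print(buffer_ok(coded, bits, start_level=5000,
                drain_per_index=3000, buffer_size=20000))
```

Shrinking buffer_size or raising drain_per_index in this toy setup makes the 
check fail on overflow or underflow, respectively. 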



In one embodiment, we solve the minimization given in equation 17 that is subject 
to constraints given by equations 18-20 by first breaking the main problem into 
smaller sub-problems. In this way, each object has its own sub-problem and can be 
solved using the frame-based optimization discussed above. Using the solutions to 
each sub-problem as input, we then consider the global solution. Of course, this 
can be accomplished through several iterations. 

In an alternative embodiment, we first allocate a target number of bits to each 
object, and then determine the skip ($f_s$) and quantizer (Q) parameters for each 
object separately. The initial rate allocation can be based on the previous 
rate-distortion characteristics of each object. As with the above approach, 
individual solutions for each object need to be re-considered in light of the 
constraints on individual skip factors, and overall bit-rate and buffer size 
constraints. 
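
One simple way to form such an initial allocation, shown only as an illustration, 
is to split the bit budget in proportion to a per-object weight derived from past 
coding results. The proportional rule is an assumption and is not stated in the 
text, and the name allocate_bits and the weights used are hypothetical. 

```python
def allocate_bits(total_budget, weights):
    """Split a bit budget across objects in proportion to their weights.

    weights -- one non-negative value per object, e.g. the bits each object
               needed in the previous cycle (an assumed choice of weight)
    """
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return [total_budget * w / total_weight for w in weights]


targets = allocate_bits(18000, [1, 2, 3])
print(targets)   # per-object bit targets
```

Each target would then be handed to the per-object search for skip and quantizer 
parameters, and revisited if the joint constraints are violated. 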

In yet another alternative embodiment, the problem is directly solved globally by 
searching over all valid combinations of skip and quantization parameters. This 
may be done in an iterative manner as above, where several choices for $f_s$ are 
considered. The main difference is that the vector $\theta$ can have numerous 
possibilities for each VOP-skip parameter $f_s$. Therefore, all valid possibilities 
for $\theta$ need to be evaluated in each case. 
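
To give a feel for the size of this search, the following sketch enumerates the 
candidate skip vectors for a given cycle length. It assumes that a skip vector is 
valid when the smallest common multiple of its entries equals the cycle length, 
which is one reading of the periodicity requirement. The name valid_skip_vectors 
is hypothetical, and the quantization parameters are left out of the enumeration. 

```python
from functools import reduce
from itertools import product
from math import gcd


def lcm_all(values):
    """Smallest common multiple of a sequence of positive integers."""
    return reduce(lambda a, b: a * b // gcd(a, b), values, 1)


def valid_skip_vectors(num_objects, cycle):
    """All skip vectors whose entries combine into exactly this cycle length."""
    divisors = [d for d in range(1, cycle + 1) if cycle % d == 0]
    return [theta for theta in product(divisors, repeat=num_objects)
            if lcm_all(theta) == cycle]


candidates = valid_skip_vectors(num_objects=3, cycle=6)
print(len(candidates))          # number of skip vectors to evaluate
print((1, 2, 3) in candidates)  # the Figure 5a configuration is among them
```

Even for three objects and a cycle of 6 the candidate list is sizeable, which is 
why the global search is the most expensive of the embodiments. 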

This invention is described using specific terms and examples. It is to be 
understood that various other adaptations and modifications may be made within 
the spirit and scope of the invention. Therefore, it is the object of the appended 
claims to cover all such variations and modifications as come within the true spirit 
and scope of the invention. 